Smoothing and compression with stochastic k-testable tree languages

Authors

  • Juan Ramón Rico-Juan
  • Jorge Calera-Rubio
  • Rafael C. Carrasco
Abstract

In this paper, we describe some techniques to learn probabilistic k-testable tree models, a generalization of the well-known k-gram models, that can be used to compress or classify structured data. These models are easy to infer from samples and allow for incremental updates. Moreover, as shown here, backing-off schemes can be defined to address data sparseness, a problem that often arises when trees are used to represent the data. These features make the models suitable for compressing structured data files at a better rate than string-based methods.
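To make the idea concrete, here is a minimal sketch of the simplest case, k = 2, where each node's sequence of child labels is predicted from the node's own label. It is an illustration under stated assumptions, not the authors' implementation: the tree encoding, the class name `TwoTestableModel`, and the toy trees are hypothetical, and probabilities are plain relative frequencies with no smoothing. The negative base-2 log-probability is the ideal code length in bits that an entropy coder driven by such a model could approach.

```python
import math
from collections import Counter, defaultdict

# Assumed encoding: a tree is (label, children), children being a tuple of trees.
def leaf(label):
    return (label, ())

def expansions(tree):
    """Yield (parent_label, child_labels) for every node: the k = 2 view."""
    label, children = tree
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from expansions(child)

class TwoTestableModel:
    """Hypothetical relative-frequency model over roots and expansions."""

    def __init__(self):
        self.root_counts = Counter()
        self.rule_counts = defaultdict(Counter)

    def update(self, tree):
        """Incremental update: add the counts of one more sample tree."""
        self.root_counts[tree[0]] += 1
        for parent, kids in expansions(tree):
            self.rule_counts[parent][kids] += 1

    def log2_prob(self, tree):
        """Base-2 log-probability of a tree; -inf if any event is unseen."""
        root_count = self.root_counts[tree[0]]
        if root_count == 0:
            return float("-inf")
        logp = math.log2(root_count / sum(self.root_counts.values()))
        for parent, kids in expansions(tree):
            counts = self.rule_counts.get(parent)
            if not counts or counts[kids] == 0:
                return float("-inf")  # unsmoothed: unseen expansion
            logp += math.log2(counts[kids] / sum(counts.values()))
        return logp

# Toy sample: a(b, c) and a(b, b)
t1 = ("a", (leaf("b"), leaf("c")))
t2 = ("a", (leaf("b"), leaf("b")))
unseen = ("a", (leaf("c"), leaf("c")))

model = TwoTestableModel()
model.update(t1)
model.update(t2)

bits_t1 = -model.log2_prob(t1)         # ideal code length for t1, in bits
unseen_logp = model.log2_prob(unseen)  # -inf without smoothing
```

The unseen tree receives probability zero here, which is the data-sparseness problem that the back-off schemes mentioned in the abstract are meant to address; this sketch deliberately leaves smoothing out.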


Similar resources

A probabilistic extension of locally testable tree languages

Probabilistic k-testable models (usually known as k-gram models in the case of strings) can be easily identified from samples and allow for smoothing techniques to deal with unseen events. In this paper we introduce the family of stochastic k-testable tree languages and describe how these models can approximate any stochastic rational tree language. This is applied, as a particular case, to the...


K-TLSS(S) language models for speech recognition

The class of K-Testable Languages in the Strict Sense (K-TLSS) is a subclass of regular languages. Previous works demonstrate that stochastic K-TLSS language models describe the same probability distribution as N-gram models, and that smoothing techniques can be efficiently applied (Back-off like methods). Once we have a set of k-TLSS models (k = 1, ..., K) and a smoothing technique that specificall...


Learning k-Testable tree sets from positive data

A k-Testable tree set in the Strict sense (k-TS) is essentially defined by a finite set of patterns of "size" k that are permitted to appear in the trees of the tree language. Given a positive sample S of trees over a ranked alphabet, an algorithm is proposed which obtains the smallest k-TS tree set containing S. The proposed algorithm is polynomial on the size of S and identifies the class of ...


Stochastic K-TSS Bi-Languages for Machine Translation

One of the approaches to statistical machine translation is based on joint probability distributions over some source and target languages. In this work we propose to model the joint probability distribution by stochastic regular bi-languages. Specifically we introduce the stochastic k-testable in the strict sense bi-languages to represent the joint probability distribution of source and target...



Journal:
  • Pattern Recognition

Volume 38

Publication date: 2005